Confidence Intervals and Hypothesis Testing
Review:
A simulation:
N <- 10000     # number of iterations
n <- 16        # sample size
m <- 10        # population mean
s <- sqrt(9)   # population SD
alpha <- 0.05  # 1 - confidence level
un <- nw <- matrix(NA, nrow = N, ncol = 2)  # two empty N x 2 matrices to hold the intervals
evaluate <- evaluate.true <- rep(FALSE, N)  # two logical vectors to record coverage
# in.CI() takes a 2-element vector x = c(lower, upper) representing an interval.
# It returns TRUE if the true mean m lies inside the interval, FALSE otherwise.
in.CI <- function(x) { x[1] <= m & m <= x[2] }
for (i in 1:N) {                            # loop starts
  Sample <- rnorm(n, m, s)                  # draw a normal sample with the given parameters
  # i-th confidence interval with the SD estimated from the sample (t quantile)
  un[i, ] <- mean(Sample) + c(-1, 1) * qt(1 - alpha/2, df = n - 1) * sd(Sample) / sqrt(n)
  # i-th confidence interval with the SD known (normal quantile)
  nw[i, ] <- mean(Sample) + c(-1, 1) * qnorm(1 - alpha/2) * s / sqrt(n)
  evaluate[i] <- in.CI(un[i, ])             # is m inside the interval when the SD is unknown?
  evaluate.true[i] <- in.CI(nw[i, ])        # is m inside the interval when the SD is known?
}
sum(evaluate == FALSE) / N        # proportion of unknown-SD intervals that miss m
sum(evaluate.true == FALSE) / N   # proportion of known-SD intervals that miss m
#> [1] 0.0506
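Both kinds of interval should miss \(m\) in roughly \(\alpha = 5\%\) of the runs. What differs is the critical value: when the SD is estimated from the sample, the interval uses a \(t\) quantile with \(n - 1\) degrees of freedom, which is larger than the normal quantile, so the interval is wider. A minimal sketch comparing the two multipliers for the simulation's settings (\(n = 16\), \(\alpha = 0.05\)):

```r
n <- 16        # sample size, as in the simulation above
alpha <- 0.05  # 1 - confidence level

z_crit <- qnorm(1 - alpha / 2)           # multiplier when the SD is known
t_crit <- qt(1 - alpha / 2, df = n - 1)  # multiplier when the SD is estimated

# The t multiplier is larger, so the unknown-SD interval is wider for the same SD
c(z = z_crit, t = t_crit)

# As the degrees of freedom grow, the t multiplier approaches the normal one
sapply(c(5, 15, 30, 100, 1000), function(df) qt(1 - alpha / 2, df = df))
```

The extra width compensates for the uncertainty in `sd(Sample)`, which is why the unknown-SD intervals still reach close to 95% coverage.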
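library(dplyr)    # select(), mutate(), relocate(), n()
library(purrr)    # map_dfr()
library(ggplot2)  # ggplot() and the geoms used below
# broom is called as broom::tidy(), so it only needs to be installed
# Draw one sample and return the t-test estimate, 95% CI and p-value (H0: mu = 0)
conf_int <- function(n = 100, mean = 0, sd = 1){
  sample <- rnorm(n = n, mean = mean, sd = sd)
  test <- t.test(sample)
  result <- broom::tidy(test) |>
    select(estimate, conf.low, conf.high, p.value)
  return(result)
}
# Repeat conf_int() `sample` times. An interval whose ends have the same sign
# excludes 0, which means H0: mu = 0 is rejected at the 5% level.
set_intervals <- function(sample = 100, n = 100, mean = 0, sd = 1){
  intervals <- map_dfr(1:sample, ~ conf_int(n = n, mean = mean, sd = sd))
  intervals <- intervals |>
    mutate(id = 1:n(),
           result = ifelse(sign(conf.low) == sign(conf.high), "reject", "fail to reject")) |>
    relocate(id)
  return(intervals)
}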
set.seed(1111)
intervals <- set_intervals(sample = 20, n = 20)
intervals |>
  ggplot(aes(estimate, id, color = result)) +
  geom_point() +
  geom_segment(aes(x = conf.low, y = id, xend = conf.high, yend = id, color = result)) +
  geom_vline(xintercept = 0, linetype = "dashed")
A statistical hypothesis is a claim about the value of a parameter.
In any hypothesis-testing problem, there are two contradictory hypotheses to consider: the null hypothesis (\(H_0\)) and the alternative hypothesis (\(H_a\)).
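A minimal sketch of a two-sided one-sample t-test in R on a made-up sample (drawn with a true mean of 0.5, an arbitrary choice), testing \(H_0: \mu = 0\) against \(H_a: \mu \neq 0\). It also checks the link used in the interval plot: at \(\alpha = 0.05\), the test rejects \(H_0\) exactly when the 95% confidence interval excludes 0.

```r
set.seed(1)                         # for reproducibility
x <- rnorm(20, mean = 0.5, sd = 1)  # hypothetical sample

# H0: mu = 0 versus Ha: mu != 0 (two-sided)
test <- t.test(x, mu = 0, conf.level = 0.95)

p_value <- test$p.value
ci <- test$conf.int                 # 95% confidence interval for mu

# At alpha = 0.05, rejecting H0 is equivalent to the 95% CI excluding 0
reject_by_p  <- p_value < 0.05
reject_by_ci <- ci[1] > 0 || ci[2] < 0
c(p_value = p_value, lower = ci[1], upper = ci[2])
reject_by_p == reject_by_ci
```

The equivalence holds because the two-sided test and the interval are built from the same \(t\) distribution with \(n - 1\) degrees of freedom.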
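Assuming \(H_0\) is true, we build a null model: the distribution the test statistic (for example, the sample mean) would follow if \(H_0\) held.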
Null model | Significance level | P-value
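A minimal sketch of a null model built by simulation, assuming \(H_0: \mu = 10\) with a known SD of 3 and samples of size 16 (the values used in the simulation above); the observed mean of 11.6 is a made-up number for illustration. The significance level \(\alpha\) sets cutoffs in the tails of the null model, and the p-value is the probability, under \(H_0\), of a sample mean at least as far from 10 as the one observed.

```r
set.seed(2024)
mu0 <- 10; sigma <- 3; n <- 16; alpha <- 0.05
B <- 10000  # number of samples simulated under H0

# Null model: sample means from many samples drawn assuming H0 is true
null_means <- replicate(B, mean(rnorm(n, mean = mu0, sd = sigma)))

# Significance level: cutoffs leaving alpha/2 in each tail of the null model
cutoffs <- quantile(null_means, c(alpha / 2, 1 - alpha / 2))

# Two-sided p-value for a hypothetical observed mean
x_bar <- 11.6
p_sim <- mean(abs(null_means - mu0) >= abs(x_bar - mu0))

# Compare with the normal-theory p-value (SD of the mean = sigma / sqrt(n))
p_exact <- 2 * pnorm(-abs(x_bar - mu0) / (sigma / sqrt(n)))
cutoffs
c(simulated = p_sim, exact = p_exact)
```

If the p-value is below \(\alpha\), the observed mean falls beyond the cutoffs and \(H_0\) is rejected.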
As the sample size \(n\) increases, the null model becomes narrower, because the standard deviation of the sample mean, \(\sigma/\sqrt{n}\), shrinks.
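Quadrupling \(n\) halves the spread of the null model. A sketch checking \(\sigma/\sqrt{n}\) by simulation, with the same assumed values (\(\mu_0 = 10\), \(\sigma = 3\)):

```r
set.seed(7)
mu0 <- 10; sigma <- 3
B <- 10000  # simulated samples per sample size

# Spread of the null model (SD of the sample mean) for several sample sizes
sizes <- c(4, 16, 64, 256)
sim_se <- sapply(sizes, function(n) sd(replicate(B, mean(rnorm(n, mu0, sigma)))))

# Theory: the SD of the sample mean is sigma / sqrt(n)
data.frame(n = sizes, simulated = sim_se, theoretical = sigma / sqrt(sizes))
```

A narrower null model means that the same distance between the observed mean and \(\mu_0\) produces a smaller p-value.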
ANOVA
ANOVA (analysis of variance) is used when the hypothesis involves the means of several groups:
\[H_0: \mu_1 = \mu_2 = \cdots = \mu_k, \quad k \geq 3.\]
\[H_a: \text{at least one } \mu_i \text{ is different.}\]
Here \(k\) is the number of groups.
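A minimal one-way ANOVA sketch in R using `aov()`, on hypothetical data: three groups of 10 observations, with group C drawn from a distribution whose mean is shifted by 1.5. The F statistic and p-value are read from the ANOVA table returned by `summary()`.

```r
set.seed(42)
# Hypothetical data: three groups; group C has a shifted mean
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 10)),
  y = c(rnorm(10, mean = 5), rnorm(10, mean = 5), rnorm(10, mean = 6.5))
)

# One-way ANOVA: H0 says all three group means are equal
fit <- aov(y ~ group, data = dat)
summary(fit)

# Extract the F statistic and p-value from the ANOVA table
tab <- summary(fit)[[1]]
f_stat <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]
c(F = f_stat, p = p_value)
```

The same table can be obtained with `anova(lm(y ~ group, data = dat))`, since a one-way ANOVA is a linear model with a single factor.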
Assumptions and requirements:
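If p-value > \(\alpha\): we fail to reject \(H_0\); the data do not provide enough evidence that any group mean differs.
If p-value < \(\alpha\): we reject \(H_0\) in favor of \(H_a\); at least one group mean differs.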
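The decision rule can be written as a small helper. `decide()` is a hypothetical name used only for illustration, and the data are a made-up sample of three groups drawn from the same distribution, so \(H_0\) is true here.

```r
# Hypothetical helper: compare a p-value with the significance level alpha
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject H0: at least one group mean differs"
  } else {
    "fail to reject H0: insufficient evidence that the group means differ"
  }
}

set.seed(3)
# Hypothetical data: all three groups share the same mean
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 15)),
  y = rnorm(45, mean = 0, sd = 1)
)
p <- summary(aov(y ~ group, data = dat))[[1]][["Pr(>F)"]][1]
decide(p)
decide(0.001)
```

Because \(H_0\) holds in this simulated data, a rejection would be a Type I error, which happens with probability \(\alpha\).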